The method of extended maximum likelihood is a well-known method of parameter estimation.
External knowledge on the unknown parameters can be incorporated by multiplying the likelihood by
constraint terms. In this note, we emphasize that this also holds for yield parameters in an extended
maximum likelihood fit, a technique widely used in the particle physics community. We recommend a
way to generate pseudo-experiments in the presence of constraint terms on yield parameters, and point
to pitfalls inside the RooFit framework.

I. INTRODUCTION

The concept of extended maximum likelihood (EML) is 
widely used for parameter estimation in particle physics. 
It is described in [1], and we shall summarize its main 
features here. In EML, the total number of events is 
regarded as a free parameter. Its best value is deter- 
mined by maximizing the likelihood function. The num- 
ber of observed events follows a probability density func- 
tion (PDF), typically a Poisson PDF. In some situations, 
the observed number of events is not the most efficient es- 
timator for the expected number of events. These situa- 
tions occur when there is at least one free parameter (or a 
combination of parameters) that simultaneously changes 
both shape and normalization of the PDF. Then, an EML 
fit is superior to a regular maximum likelihood (ML) fit. 
These genuine EML situations are labeled "type A", following
the notation of [1].

The textbook example of a type A situation is that
of an unknown signal over a known background of $N_b$
events. Suppose both signal and background are described
by unit Gaussian PDFs $G(x; \mu, \sigma = 1)$; then one
possible (non-normalized) total PDF is

$$g(x) = N_s\, G(x; \mu_1 = 0) + N_b\, G(x; \mu_2 = 0.5) , \qquad (1)$$

with $N_s$ being the only free parameter.

Besides the genuine EML situation, there are also
"type B" EML situations (or "bogus", again following [1]),
where both EML and ML give equivalent results.
This is the case when in Eq. (1) $N_b$ is also a free
parameter. Then we can rewrite

$$g(x) = N \left( f\, G(x; \mu_1 = 0) + (1 - f)\, G(x; \mu_2 = 0.5) \right) , \qquad (2)$$

with the total number of events $N = N_s + N_b$ and the
signal fraction $f = N_s / N$. Now $f$ controls only the shape
of the PDF, while $N$ controls only the normalization. It
might still be beneficial to formulate a problem using
EML terms as in Eq. (1) even if it truly is a type B problem.
This is because the ML notation of Eq. (2) quickly
leads to less intuitive fraction parameters if more than
one background component is present, while the yields
of Eq. (1) are interpreted easily.
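
The relation between the yield and the fraction parametrization can be made concrete with a small sketch. The helper names and the example yields below are illustrative assumptions, not part of any fitting package; the sketch only shows that the two parametrizations carry the same information.

```python
def yields_to_fractions(yields):
    """Convert component yields into (total yield, list of fractions)."""
    total = sum(yields)
    return total, [y / total for y in yields]


def fractions_to_yields(total, fractions):
    """Inverse mapping: recover the component yields."""
    return [total * f for f in fractions]


# Hypothetical example: one signal and two background components.
total, fractions = yields_to_fractions([500.0, 100.0, 100.0])
recovered = fractions_to_yields(total, fractions)
```

With several components, each fraction depends on all yields through the total, which is one way to see why yields are often the more intuitive parameters.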



The extended likelihood is formed by multiplying the
classical likelihood by a Poisson term,

$$\mathcal{L} = \frac{e^{-N} N^{N_{\rm obs}}}{N_{\rm obs}!} \prod_{i=1}^{N_{\rm obs}} \mathcal{P}(\vec{x}_i; \vec{\lambda}) , \qquad (3)$$

where $N$ and $N_{\rm obs}$ are the number of expected and observed
events, respectively, $\mathcal{P}$ is the total PDF, $\vec{x}$ is the
vector of observables, and $\vec{\lambda}$ is the vector of parameters to be
estimated. The constant factorial term ($N_{\rm obs}!$) is usually
omitted as it does not change the shape of $-\ln \mathcal{L}$ at its
minimum.
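
As a minimal sketch of how such an extended negative log-likelihood can be evaluated for the two-Gaussian model of Eq. (1), consider the following Python snippet. The function names, the argument layout, and the sample data are assumptions made for illustration; the constant factorial term is dropped as described above.

```python
import math


def gauss(x, mu, sigma=1.0):
    """Normalized Gaussian density G(x; mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def extended_nll(n_s, n_b, data, mu_s=0.0, mu_b=0.5):
    """Extended -ln L for the model of Eq. (1), without the ln(N_obs!) term."""
    n_exp = n_s + n_b
    # Poisson term: -ln(e^{-N} N^{N_obs}) = N - N_obs * ln(N)
    nll = n_exp - len(data) * math.log(n_exp)
    # Shape term: the normalized total PDF is g(x) / N
    for x in data:
        g = n_s * gauss(x, mu_s) + n_b * gauss(x, mu_b)
        nll -= math.log(g / n_exp)
    return nll


# Hypothetical three-event dataset, for illustration only.
sample = [0.1, -0.3, 0.7]
value = extended_nll(2.0, 1.0, sample)
```

The two $\ln N$ contributions cancel, so the result equals $N - \sum_i \ln g(x_i)$, which is the form in which the non-normalized PDF of Eq. (1) enters directly.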

In the following we discuss how to include external
constraints into the (extended) likelihood and review the
effects of such terms. Then we describe a way to generate
pseudo ("toy") experiments, and demonstrate that
it leads to unbiased results if the correct pull statistic
is chosen. We will show that this is still the case
when constraints on yield parameters are present.
Finally, we will point out several pitfalls that are present
in the toy experiment tools of a current version of the
RooFit framework.



II. CONSTRAINTS 

If there is knowledge available on the true value of a
fit parameter, we can incorporate this knowledge into the
fit procedure. For example, a previous experiment might
have already measured the parameter at hand, and we
have access to their published result, say $\lambda_e \pm \sigma_{\lambda_e}$. It
is well known how to incorporate such constraints into
maximum likelihood fits. The full likelihood function is
multiplied by the constraint PDF $C(\lambda)$ (where $\lambda$ is a
component of $\vec{\lambda}$),

$$\mathcal{L}_c = C(\lambda) \times \mathcal{L} . \qquad (4)$$



This also holds in the EML case, and also for constraints
on yield parameters, even though the likelihood is no longer
Poissonian in the total yield, but contains the
product of a Poisson term in the total yield and a
non-Poissonian constraint term in a component yield. Often
a Gaussian distribution is assumed for $C(\lambda)$,

$$C(\lambda) = \frac{1}{\sqrt{2\pi}\,\sigma_{\lambda_e}} \exp\left( -\frac{(\lambda_e - \lambda)^2}{2\sigma_{\lambda_e}^2} \right) . \qquad (5)$$

If more than one parameter is constrained, there can in
principle be an external correlation between them. This
external correlation is different from the internal one. It
can easily be accounted for by, for example, replacing the
single Gaussian of Eq. (5) by a multivariate one,

$$C(\vec{\lambda}) = \frac{1}{\sqrt{(2\pi)^l\, |V_e|}} \exp\left( -\frac{1}{2} (\vec{\lambda}_e - \vec{\lambda})^T V_e^{-1} (\vec{\lambda}_e - \vec{\lambda}) \right) , \qquad (6)$$

where $V_e$ is the known $l \times l$ external covariance matrix.
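
As a short worked step, written in the notation of this section and assuming the single Gaussian constraint of Eq. (5), taking the negative logarithm of Eq. (4) shows that the constraint acts as an additive quadratic penalty on the usual $-\ln \mathcal{L}$:

```latex
-\ln \mathcal{L}_c
  = -\ln \mathcal{L}
    + \frac{(\lambda_e - \lambda)^2}{2\sigma_{\lambda_e}^2}
    + \text{const} .
```

The constant collects the normalization of the Gaussian, which does not depend on $\lambda$ and therefore does not move the minimum.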

We now recall two effects of including a constraint term
for a parameter $\lambda$: constraint terms bring in external
knowledge, and they are a means of error propagation.

Compared to a situation with a floating parameter and
no constraint, including the constraint term will reduce
the reported error on this parameter. Suppose that when
$\lambda$ is left floating without constraint the result is
$\lambda_u \pm \sigma_{\lambda_u}$, and with the constraint term included it is
$\lambda_c \pm \sigma_{\lambda_c}$. If both the unconstrained likelihood and the constraint
term are uncorrelated and Gaussian in $\lambda$, the likelihood
fit is equivalent to the weighted average of $\lambda_u$ and $\lambda_e$.
Thus the error will be given by

$$\frac{1}{\sigma_{\lambda_c}^2} = \frac{1}{\sigma_{\lambda_u}^2} + \frac{1}{\sigma_{\lambda_e}^2} , \qquad (7)$$

so that $\sigma_{\lambda_c} < \sigma_{\lambda_u}$.
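
The weighted-average picture can be checked numerically with a small sketch. The function name and the input numbers are hypothetical and chosen only for illustration; the calculation assumes uncorrelated Gaussian inputs, as stated above.

```python
import math


def combine(lam_u, sig_u, lam_e, sig_e):
    """Inverse-variance weighted average of two uncorrelated Gaussian results."""
    w_u = 1.0 / sig_u ** 2
    w_e = 1.0 / sig_e ** 2
    lam_c = (w_u * lam_u + w_e * lam_e) / (w_u + w_e)
    sig_c = 1.0 / math.sqrt(w_u + w_e)
    return lam_c, sig_c


# Hypothetical unconstrained result 105 +- 10 and external result 100 +- 5.
lam_c, sig_c = combine(lam_u=105.0, sig_u=10.0, lam_e=100.0, sig_e=5.0)
```

In this example the combined error is smaller than both input errors, and the central value is pulled towards the more precise input.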

Constraint terms are also a means of error propagation. 
If the likelihood depends not only on the fit parameters, 
but also on parameters that are fixed, one may want to 
propagate the errors of the fixed parameters into the fit 
result. This can be done by including constraint terms 
in the fixed parameters, and letting the previously fixed 
parameters float, too. If there are non-zero correlations 
between the previously fixed and the floating parameters, 
the errors on the latter will increase, reflecting the prop- 
agated uncertainty on the previously fixed parameters. 
The reported errors on the previously fixed parameters 
will in general be smaller than given by the constraint. 
This is because the dataset can also hold information on 
them. 

In addition to the above effects, constraint terms can
also be incorporated to help the fit converge. When doing
this, the errors are modified, for example as indicated by
Eq. (7). This might spoil the interpretation of the reported
fit errors as being "statistical", if $\sigma_{\lambda_e}$ is not statistical and
is also of the same order as $\sigma_{\lambda_u}$.

The effects of constraints described above are not limited
to shape parameters. They also apply to normalization
parameters such as the fraction parameters of Eq. (2)
and the yield parameters of Eq. (1). But constraining fraction
parameters is not equivalent to constraining yield
parameters. If, for example, we know the rate of a background
process as a fraction of the rate of a control process,
we should constrain this fraction. If, on the other
hand, we know the absolute rate, we should constrain
the yield. As pointed out above, the full likelihood is not
required to be Poissonian in its yield parameters. Thus
a Gaussian constraint on a Poissonian yield parameter is
the correct implementation, even if the sum of a Gaussian
and a Poissonian random variable does not follow a Poissonian
PDF. The constraint term on a yield parameter
can even have a width smaller than $\sqrt{N}$. This happens,
for example, when the constraint is derived from a large
control yield $Y \pm \sqrt{Y}$ by scaling down by a factor $\varepsilon$ that
has no uncertainty: $y_e = \varepsilon Y \pm \varepsilon \sqrt{Y}$. In such situations,
the constraint term will push the fit into the genuine
type A EML regime.
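
As a worked illustration with purely hypothetical numbers (not taken from the analysis in this note), take a control yield of $Y = 10^4$ events and an exactly known scale factor $\varepsilon = 0.01$:

```latex
y_e = \varepsilon Y \pm \varepsilon \sqrt{Y}
    = 0.01 \times 10^4 \pm 0.01 \times 100
    = 100 \pm 1 ,
\qquad \text{whereas} \qquad
\sqrt{y_e} = \sqrt{100} = 10 .
```

In this example the constraint is ten times narrower than the Poisson fluctuation expected for a yield of 100 events.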



III. PSEUDO EXPERIMENTS 

Generating and fitting back a large number of pseudo 
experiments is a powerful tool to understand and validate 
a fit procedure. Pseudo experiments are generated by 
drawing a pseudo dataset from the full PDF, for example 
through a hit-and-miss algorithm. 

In an EML situation it is important that in the pseudo 
datasets the component event yields all fluctuate like a 
Poissonian. As a consequence, also the total yield fluctu- 
ates like a Poissonian, and each pseudo dataset contains 
a different number of events. Note that each yield must 
fluctuate independently, so that their ratios are not con- 
stant across the toy experiments. It is not enough that 
the total yield fluctuates like a Poissonian. 

If constraints are present, they have to be considered
when generating and fitting a toy dataset. In particular,
there is a "right" and a "wrong" way of doing it, as outlined
in Ref. [3]. The "right way" is to interpret the constraint
as stemming from an external measurement: we
not only have to repeat our own measurement (by drawing
events from the full PDF), but also have to repeat
the external measurement by drawing from the constraint
PDF. So each toy experiment will be performed with a
different constraint term, but using the same shape for
the total PDF. The "wrong way" is to fluctuate the total
shape and not the constraint term, so that each experiment
uses the same constraint term, but draws events
from different total PDFs. This will lead to biased results.

If there are constraints present for yield parameters,
their correct treatment in toy generation is still the above
"right way". This holds even though the likelihood function
contains not only a Poisson term (the EML term),
but also a generally non-Poissonian term (the constraint).
One might therefore conclude that the total yield should not
be generated from a Poissonian, while in fact it should.

Let us be more specific. Fixing the notation, we will
denote for a parameter $\lambda$ its true value as $\lambda_t$, its value
as estimated by the fit as $\lambda_f$, its value as determined by
an external measurement as $\lambda_e$, and a generated value
as $\lambda'$. Suppose the total PDF is that of Eq. (1) and we
add a Gaussian constraint on the background yield, corresponding
to an external measurement of $N_{b,e} \pm \sigma_{N_{b,e}}$.
Obviously the fit will be biased if we constrain a parameter
to anything else but its true value, so we will assume
$N_{b,e} = N_{b,t}$ (if $N_{b,e}$ was obtained from a genuine external
measurement, this bias will likely go in the direction of
the true value). To generate a toy experiment, we have
to

1. draw a value $N'_b$ from a Poissonian $P(N; N_{b,t})$, and
a value $N'_s$ from a Poissonian $P(N; N_{s,t})$,

2. generate $N'_b$ background events from $G(x; \mu_2)$ and
$N'_s$ signal events from $G(x; \mu_1)$,

3. draw a toy constraint value $N'_{b,e}$ from the constraint
PDF $G(N_b; N_{b,e}, \sigma_{N_{b,e}})$.

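Then the likelihood to be maximized for this particular
experiment is

$$\mathcal{L} = G(N_b; N'_{b,e}, \sigma_{N_{b,e}}) \times \frac{e^{-N} N^{N'}}{N'!} \prod_{i=1}^{N'} \mathcal{P}(x_i; N_s, N_b) , \qquad (8)$$

where $N' = N'_s + N'_b$ is the generated total yield and
$N = N_s + N_b$ is the expected total yield to be estimated by the fit.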
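
The three generation steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the true yields, and the constraint width are hypothetical, and the Poisson sampler is a simple textbook method (Knuth's multiplication algorithm) that is only adequate for moderate means.

```python
import math
import random


def poisson(rng, mean):
    """Poisson variate via Knuth's multiplication method (means below ~700 only)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def generate_toy(rng, n_s_true, n_b_true, n_b_ext, sigma_ext, mu_s=0.0, mu_b=0.5):
    """Generate one toy dataset and one toy constraint value (the 'right way')."""
    # Step 1: fluctuate each component yield independently like a Poissonian.
    n_s_gen = poisson(rng, n_s_true)
    n_b_gen = poisson(rng, n_b_true)
    # Step 2: draw the events from the fixed unit-Gaussian component shapes.
    events = [rng.gauss(mu_s, 1.0) for _ in range(n_s_gen)]
    events += [rng.gauss(mu_b, 1.0) for _ in range(n_b_gen)]
    rng.shuffle(events)
    # Step 3: repeat the external measurement by drawing from the constraint PDF.
    n_b_constraint = rng.gauss(n_b_ext, sigma_ext)
    return events, n_b_constraint


# Hypothetical scenario: 500 signal and 100 background events expected,
# background constrained to 100 +- 10.
rng = random.Random(1)
events, n_b_constraint = generate_toy(rng, 500.0, 100.0, 100.0, 10.0)
```

Each call returns a dataset of different size together with a different constraint value; the fit to that dataset would then use the returned constraint value as the mean of its Gaussian constraint term, with the shape of the total PDF left unchanged.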

It is interesting to note that the Poisson EML term
is technically also a constraint. It constrains the fitted
total number of events to the observed number of events.
It also varies with the toy experiments, because the generated
"observed" number of events varies.



IV. PULL DEFINITIONS 



The pull statistic is defined as

$$p_1 = \frac{\lambda_f - \lambda_t}{\sigma_{\lambda_f}} , \qquad (9)$$

where $\lambda_f \pm \sigma_{\lambda_f}$ is the fit result of one particular pseudo
experiment, and $\lambda_t$ is the true value. One expects the
pull to follow a unit Gaussian, so from its observed distribution
one can draw conclusions about whether or not
the fit reports unbiased central values and errors of correct
coverage. If the pull distribution has mean $\mu_p \pm \sigma_{\mu_p}$
and width $w_p \pm \sigma_{w_p}$ that are not equal to 0 and 1, respectively,
one can decide to correct the fit result for these biases:

$$\lambda_f \to \lambda_f - \mu_p\, \sigma_{\lambda_f} , \qquad (10)$$

$$\sigma_{\lambda_f} \to w_p\, \sigma_{\lambda_f} . \qquad (11)$$

The pull formed with the corrected quantities then has
mean 0 and width 1.

If a constraint to $\lambda_e \pm \sigma_{\lambda_e}$ is present, the usual pull
of Eq. (9) still follows a unit Gaussian, provided the toy
experiments are generated in the "right way" as described
in Section III.

Reference [2] defines a second pull statistic as

$$p_2 = \frac{\lambda_f - \lambda_e}{\sqrt{\sigma_{\lambda_e}^2 - \sigma_{\lambda_f}^2}} . \qquad (12)$$

The square root is always defined as a consequence of
Eq. (7). Ref. [2] points out that this definition may exhibit
a slower convergence towards the unit Gaussian distribution,
i.e. the limit is reached only for a large number of events
in the toy samples (not for a large number of toy experiments).
However, the authors do not discuss constraints in the context of EML
fits, and we found $p_2$ not to follow a unit Gaussian even
with sufficiently large samples.

A third possibility is

$$p_3 = \frac{\lambda_f - \lambda'_e}{\sigma_{\lambda_f}} , \qquad (13)$$

where the generated constraint value $\lambda'_e$ is used rather
than the fixed $\lambda_e$. This definition is used in certain situations
by the RooFit framework [3], which we will discuss
later. When generating toy experiments in the right way,
we found that $p_3$ does not follow a unit Gaussian either.

Using pull definitions with different convergence rates
comes with an additional complication: if a bias correction
is necessary in a situation with too few events for
the limit to be valid, the correction will depend on the
pull definition.

In conclusion, we recommend using $p_1$ in combination
with the right way of generating toy experiments. This
combination gives unit pulls even if constraints on yield
parameters are present.
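
A minimal sketch of how the standard pull of Eq. (9) and the bias correction can be evaluated from a set of toy fit results is given below. The function names and the toy numbers are hypothetical, and the uncertainties on the pull mean and width use the usual large-sample Gaussian approximations, which is an assumption of this sketch rather than a statement about how the numbers in this note were obtained.

```python
import math
import statistics


def pull_summary(fitted, errors, truth):
    """Mean and width of the pull (fit - truth) / error, with approximate uncertainties."""
    pulls = [(f - truth) / e for f, e in zip(fitted, errors)]
    n = len(pulls)
    mean = statistics.fmean(pulls)
    width = statistics.stdev(pulls)
    # Large-sample Gaussian approximations for the uncertainties.
    mean_err = width / math.sqrt(n)
    width_err = width / math.sqrt(2.0 * n)
    return mean, mean_err, width, width_err


def correct(fit_value, fit_error, pull_mean, pull_width):
    """Shift the central value by the pull mean and scale the error by the pull width."""
    return fit_value - pull_mean * fit_error, pull_width * fit_error


# Hypothetical fit results of four toy experiments with true value 100.
mean, mean_err, width, width_err = pull_summary(
    [101.0, 99.0, 102.0, 98.0], [2.0, 2.0, 2.0, 2.0], truth=100.0
)
corrected_value, corrected_error = correct(100.0, 10.0, pull_mean=0.3, pull_width=1.1)
```

For 5000 toy experiments and a width close to one, these approximations give uncertainties of about 0.014 on the mean and 0.010 on the width.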



V. EXAMPLES 

Let us consider the following example. We will add
to the scenario of Eq. (1) a third, low-yield Gaussian, to
make the situation symmetric. The observable might
represent an invariant mass of a reconstructed composite
particle, and the low-yield Gaussians might correspond
to backgrounds in which a daughter particle was
misreconstructed:

$$g(x) = N_s\, G(x; \mu_s = m_0 = 140) + N_{b1}\, G(x; \mu_{b1} = m_0 + 2) + N_{b2}\, G(x; \mu_{b2} = m_0 - 2) . \qquad (14)$$

Each Gaussian has unit width. We will assume the
true yields $N_{s,t} = 500$ for the signal, and
$N_{b1,t} = N_{b2,t} = 100$ for the backgrounds. We consider Gaussian
constraint terms for both backgrounds $N_{b1}$ and $N_{b2}$,
$G(N_{bi}; \mu = N_{bi,t}, \sigma = \sqrt{N_{bi,t}})$. An example of such a
pseudo experiment is shown in Fig. 1.

In Figure 2 we show the pull distributions of each
pull definition in Section IV using 5000 toy experiments.
While the standard definition Eq. (9) is consistent with a
unit Gaussian ($\mu_p = -0.031 \pm 0.014$, $w_p = 0.993 \pm 0.011$),
the other definitions, Eqs. (12) and (13), are not. When enlarging
the sample sizes to $N_t = 70\,000$, the distributions remain
unchanged.

FIG. 1: Example toy experiment drawn from Eq. (14) with
$N_{s,t} = 500$ and $N_{b1,t} = N_{b2,t} = 100$.

FIG. 3: Distributions of the same pull definitions as described
in the caption of Figure 2, but when performing toy experiments
in the "wrong way".

In Figure 3 we show again the three pull distributions
for generating and fitting the "wrong way". Now the definitions
Eq. (12) and Eq. (13) follow a unit Gaussian, while
definition Eq. (9) does not. However, this depends on the
width of the constraint. These examples support our
conclusion of Section IV.



FIG. 2: Pull distributions according to Eq. (9) (blue), Eq. (12)
(red), and Eq. (13) (green). Overlaid is a unit Gaussian
(dashed).



To illustrate how a tight constraint term can push the
fit into the type A EML regime, we now successively
tighten the constraints on the $N_{bi}$. We observe in Figure 4
that the difference between fitted and generated total
number of events, $N_f - N'$, can grow larger as the constraints
get stronger. The widest distribution is reached
at a constraint width of about $\sqrt{N_{bi,t}}$. Then, deviations of up to
$\approx 10$ events are possible, corresponding to $\sim 1.4\%$ of the
events in the considered scenario.

FIG. 4: Distribution of the difference of fitted and generated
total number of events $N_f - N'$ for different values of the
constraint width on the $N_{bi}$: $\sigma_{N_{bi,e}} = 200$ (red), 70 (green),
10 (blue).



VI. ROOFIT 

The RooFit framework [3] is widely used in experimental
particle physics to implement sophisticated maximum
likelihood fits. It also features a mechanism to
automate pull studies, RooMCStudy. We would like to
point out several pitfalls present in RooFit version 3.5.4
(bundled with ROOT version 5.34.00).

There are two ways to configure RooMCStudy for
use with constraint terms. The first, using the
Constrain() argument, is supposed to be used when
the constraint term is part of the original PDF definition.
The second, using the ExternalConstraints()
argument, should be used when the constraint terms are
supplied separately. The two ways do not give identical
results. In the following, we refer to pulls obtained through
RooMCStudy::plotPull().

Using Constrain(): RooMCStudy generates the
"wrong way" sketched in Section III. This is particularly
important if constraints on yield parameters are present.
Then, RooFit first fluctuates the expected yield using
the constraint term, and then again fluctuates the result
using the EML Poisson term. As a consequence, the total
generated yield does not follow a Poissonian anymore.
For the pull computation Eq. (13) is used. If the width of
a yield constraint is much larger than $\sqrt{N}$, one expects
results similar to those obtained in the unconstrained situation.
But in our example scenario, we observe a moderate
bias of $\mu_p = 0.3$. Also, the distributions of both the
central value and the error are much wider compared to
the unconstrained situation. This is shown in Figure 5.
It can also happen that the effects cancel by chance:
in a second test scenario, corresponding to Eq. (1) with
$N_{s,t} = 1000$, $N_{b,t} = 500$, and $G(N_b; \mu = N_{b,t}, \sigma = 150)$,
we observed a unit Gaussian pull, while the error and
central value distributions of $N_b$ were still too wide.
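
Why the doubly fluctuated yield is no longer Poissonian can be seen from the law of total variance. The following is a sketch under the simplifying assumption that the expected yield $\nu$ is first drawn from a Gaussian of mean $N_t$ and width $\sigma$ (neglecting the truncation at $\nu < 0$), and that the generated yield $N'$ is then drawn from a Poissonian of mean $\nu$; the symbol $\nu$ is introduced here only for this illustration.

```latex
\mathrm{E}[N'] = \mathrm{E}[\nu] = N_t ,
\qquad
\mathrm{Var}[N'] = \mathrm{E}\big[\mathrm{Var}[N' \mid \nu]\big] + \mathrm{Var}\big[\mathrm{E}[N' \mid \nu]\big]
                 = N_t + \sigma^2 .
```

A Poissonian would have $\mathrm{Var}[N'] = N_t$. For a wide constraint with $N_t = 100$ and $\sigma = 100$, the variance under this assumption would be $100 + 100^2 = 10\,100$ instead of 100.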



FIG. 5: Error distribution (red) obtained from RooMCStudy
when using the Constrain() approach and a wide constraint
$G(N_{bi}; \mu = N_{bi,t}, \sigma = N_{bi,t})$. Also shown is the error
distribution in the unconstrained case (green).

Another pitfall when using Constrain() is that if the
generateAndFit() function is called in an EML scenario,
and if one explicitly specifies the total number of
events to be generated in the function call, then the pulls
depend on the width of the constraint: for wide constraints,
the pull distribution will be too wide. This is
illustrated in Figure 6.

Using ExternalConstraints(): RooMCStudy generates
the "right way", i.e. the component yields fluctuate
like a Poissonian. But during fitting, the same
fixed constraint is always used, and the pull is computed using
Eq. (9). As a consequence, the resulting pull distribution is
too narrow. Thus, if the constraint is wide enough compared
to $\sqrt{N}$, the unconstrained situation is recovered.
This is illustrated in Figure 7.

Considering these difficulties it is clear that, in order
to be able to draw conclusions about a potential fit bias, the user
needs a detailed understanding of RooMCStudy.



FIG. 6: Pull distribution (red) obtained from RooMCStudy
when using the Constrain() approach and a wide constraint
$G(N_{bi}; \mu = N_{bi,t} = 100, \sigma = N_{bi,t})$. The other distribution
(green) is for the same scenario, but obtained by explicitly
stating $N_t = 700$ in the generateAndFit() function call.
Fit results printed in the figure: $\mu_p = -0.229 \pm 0.015$,
$w_p = 1.089 \pm 0.011$; $\mu_p = -0.661 \pm 0.020$, $w_p = 1.425 \pm 0.140$.

FIG. 7: Pull distribution obtained from RooMCStudy when
using the ExternalConstraints() approach and a wide constraint
$G(N_{bi}; \mu = N_{bi,t} = 100, \sigma = 50)$ (green), a medium
constraint ($\sigma = 10$, red), and a narrow constraint ($\sigma = 1$,
blue).



VII. CONCLUSION 

We have discussed the basic features of extended maximum
likelihood fits, and how to use constraint terms
to incorporate external knowledge into these fits. If
constraint terms are present, the generation of pseudo
datasets requires care. We recommend using the "right
way", in which the constraint is fluctuated in the generation
step and the PDF is not, together with the usual pull
definition. Then we find the pull to follow a unit Gaussian
even if constraints on yield parameters are present.

The authors wish to thank Niels Tuning for useful discussion.